Abstract



The standard two-variable chi-square test is typically consistent for all alternatives to
independence, but effectively treats the data as nominal, which may lead to loss of power for ordinal
data. Alternatively, a test based on Kendall's tau does take ordinality into account, but only
has power against a narrow set of alternatives. This paper introduces a new test aimed at filling
this gap, i.e., it is designed for ordinal data and to have asymptotic power for all alternatives.
Our test is a permutation test based on a modification of Kendall's tau, denoted $\tau^*$, which is
nonnegative and equal to zero if and only if independence holds. An interpretation
of $\tau^*$ in terms of concordance and discordance for sets of four observations is given. The new
coefficient is a sign version of a covariance introduced by Bergsma (2006).

Keywords: measure of association, test of independence, concordance, discordance, sign test,
ordinal data, permutation test, copula.

1 Introduction and overview of main results

Sign covariances such as Kendall's tau ($\tau$) are especially useful for testing independence when (i) the
data are ordinal (whether continuous or discontinuous) and the ordinary covariance is inappropriate,
(ii) the data are heavy tailed and the ordinary covariance may not be defined, or (iii) the data are
contaminated and robustness is needed. Kendall's tau, however, has the possible drawback that
it may be zero when there is association present. To achieve power against broader alternatives,
the chi-square test can be used; it is directly applicable to categorical data and can be used for
continuous data after a suitable categorization. However, the chi-square test for data with ordered
outcomes does not take the ordinal nature of the data into account, leading to potential power
loss for 'ordinal' alternatives; effectively the chi-square test treats the data as nominal rather
than ordinal (see also Agresti, 2010). As a possible substitute for these two tests, we consider a
modification of $\tau^2$, which we call $\tau^*$. Tests based on the sample value of $\tau^*$ yield power against a
broad range of ordinal alternatives. Below, we first derive $\tau^*$, then summarize its properties and
provide a probabilistic interpretation in terms of concordance and discordance probabilities.

Regarding notation, we denote iid sample values by $(x_1,y_1),\ldots,(x_n,y_n)$, but will also use
$\{(X_i,Y_i)\}$ to denote iid replications of $(X,Y)$ in order to define population coefficients. For the
most part of this paper we assume all variables are real, but briefly touch upon more general metric
sample spaces. The empirical value $t$ of Kendall's tau is

$$t = \frac{1}{n^2}\sum_{i,j=1}^{n} \mathrm{sign}(x_i - x_j)\,\mathrm{sign}(y_i - y_j)$$

and its population version is

$$\tau = E\,\mathrm{sign}(X_1 - X_2)\,\mathrm{sign}(Y_1 - Y_2)$$

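(Kruskal, 1958; Kendall & Gibbons, 1990). With

$$s(z_1,z_2,z_3,z_4) = \mathrm{sign}\big((z_1 - z_4)(z_3 - z_2)\big)
= \mathrm{sign}\big(|z_1 - z_2|^2 + |z_3 - z_4|^2 - |z_1 - z_3|^2 - |z_2 - z_4|^2\big)$$

we obtain

$$t^2 = \frac{1}{n^4}\sum_{i,j,k,l=1}^{n} s(x_i,x_j,x_k,x_l)\,s(y_i,y_j,y_k,y_l)$$

and

$$\tau^2 = E\,s(X_1,X_2,X_3,X_4)\,s(Y_1,Y_2,Y_3,Y_4)$$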

Replacing squared differences in $s$ by absolute values of differences, we define

$$a(z_1,z_2,z_3,z_4) = \mathrm{sign}\big(|z_1 - z_2| + |z_3 - z_4| - |z_1 - z_3| - |z_2 - z_4|\big) \tag{1}$$

This leads to a modified version of $t^2$,

$$t^* = \frac{1}{n^4}\sum_{i,j,k,l=1}^{n} a(x_i,x_j,x_k,x_l)\,a(y_i,y_j,y_k,y_l)$$

and the corresponding population coefficient

$$\tau^* = \tau^*(X,Y) = E\,a(X_1,X_2,X_3,X_4)\,a(Y_1,Y_2,Y_3,Y_4)$$
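As a rough illustration of the sample coefficients defined above, the following Python sketch evaluates Kendall's $t$ and the modified coefficient $t^*$ by brute force. It is a sketch under assumptions, not the authors' implementation: it assumes that the sums run over all index tuples (repeated indices included) with normalisations $1/n^2$ and $1/n^4$, and the names `kendall_t`, `a_sign` and `t_star` are ours. The four-fold loop is only practical for very small samples.

```python
from itertools import product


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_t(x, y):
    """Sample Kendall's tau, averaged over all ordered index pairs."""
    n = len(x)
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i, j in product(range(n), repeat=2))
    return total / n**2


def a_sign(z1, z2, z3, z4):
    """The kernel a of equation (1)."""
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def t_star(x, y):
    """Sample t*, averaged over all index quadruples (assumed 1/n^4 scaling)."""
    n = len(x)
    total = sum(a_sign(x[i], x[j], x[k], x[l]) * a_sign(y[i], y[j], y[k], y[l])
                for i, j, k, l in product(range(n), repeat=4))
    return total / n**4


# A small V-shaped sample: y depends on x, but not monotonically.
x = [-1, 0, 1]
y = [1, 0, 1]
print(kendall_t(x, y))  # 0.0: the two non-tied pairs cancel
print(t_star(x, y))     # 8/81, roughly 0.0988
```

In this toy sample Kendall's $t$ vanishes although $y$ is a function of $x$, while the modified coefficient is strictly positive.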

The quantities $t^*$ and $\tau^*$ are new, and the main result of the paper is the following:

Theorem 1 Let $X$ and $Y$ be real continuous random variables. Then $\tau^*(X,Y) \ge 0$ with equality
if and only if $X$ and $Y$ are independent.

The proof is given in Section 2. We conjecture that the continuity condition is not needed, and
have the following partial proofs of this: (i) Lemma 1 in Section 2 shows that if $X$ is binary and $Y$
continuous or discrete, the theorem holds; (ii) the code for a computational 'proof' of nonnegativity
of $\tau^*$ using Mathematica for 3x3 contingency tables with given marginals is given in Appendix B.
If the sign functions are omitted from $\tau^*$, we obtain the covariance $\kappa$ introduced by Bergsma
(2006). He showed that for arbitrary real random variables $X$ and $Y$, $\kappa(X,Y) \ge 0$ with equality if
and only if $X$ and $Y$ are independent.



We now give a probabilistic interpretation of $\tau^*$. A pair of points $\{(x_1,y_1),(x_2,y_2)\}$ is called
concordant if $(x_1 - x_2)(y_1 - y_2) > 0$ and discordant if $(x_1 - x_2)(y_1 - y_2) < 0$, as illustrated in
Figure 1. Denoting the probability that two randomly chosen points are concordant by $\Pi_{C_2}$ and
the probability that they are discordant by $\Pi_{D_2}$, Kendall's tau has the well-known probabilistic
interpretation

$$\tau = \Pi_{C_2} - \Pi_{D_2}$$

An analogous interpretation of $\tau^*$ can be given. A set of four points is concordant if there
exist vertical and horizontal axes such that two opposing open quadrants contain two points each
(Figure 2(a)). The set is discordant if there exist vertical and horizontal axes such that every
open quadrant contains a single point (Figure 2(b)). Note that the axes must strictly separate
the points, i.e., no points can fall on the axes. In mathematical notation, a set of four points
$\{(x_1,y_1),\ldots,(x_4,y_4)\}$ is concordant if there is a permutation $(i,j,k,l)$ of $(1,2,3,4)$ such that

$$(x_i,x_j < x_k,x_l) \wedge [(y_i,y_j < y_k,y_l) \vee (y_i,y_j > y_k,y_l)]$$

and discordant if there is a permutation $(i,j,k,l)$ of $(1,2,3,4)$ such that

$$[(x_i,x_j < x_k,x_l) \vee (x_i,x_j > x_k,x_l)] \wedge [(y_i,y_k < y_j,y_l) \vee (y_i,y_k > y_j,y_l)]$$

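It is straightforward to verify that

$$a(z_1,z_2,z_3,z_4) = I(z_1,z_3 < z_2,z_4) + I(z_1,z_3 > z_2,z_4)
- I(z_1,z_2 < z_3,z_4) - I(z_1,z_2 > z_3,z_4)$$

where $I$ is the indicator function and $I(z_1,z_2 < z_3,z_4)$ is shorthand for
$I(z_1 < z_3 \wedge z_1 < z_4 \wedge z_2 < z_3 \wedge z_2 < z_4)$. For example, by (1),
$a(0,1,2,3) = \mathrm{sign}(1 + 1 - 2 - 2) = -1$, in agreement with the third indicator. The overall sign
of $a$ cancels in the product $a(X_1,\ldots,X_4)\,a(Y_1,\ldots,Y_4)$. Hence,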

$$\begin{aligned}
\tau^* = {} & 2P(X_1,X_2 < X_3,X_4 \wedge Y_1,Y_2 < Y_3,Y_4) \\
& + 2P(X_1,X_2 < X_3,X_4 \wedge Y_1,Y_2 > Y_3,Y_4) \\
& - 4P(X_1,X_2 < X_3,X_4 \wedge Y_1,Y_3 < Y_2,Y_4)
\end{aligned} \tag{2}$$

Denoting the probability that four randomly chosen points are concordant as $\Pi_{C_4}$ and the
probability that they are discordant as $\Pi_{D_4}$, we obtain that the sum of the first two probabilities on the
right-hand side of (2) equals $\Pi_{C_4}/6$, while the last probability equals $\Pi_{D_4}/24$. Hence,

$$\tau^* = \frac{2\Pi_{C_4} - \Pi_{D_4}}{6} \tag{3}$$

The reason that $\Pi_{C_4}$ is given twice as much weight as $\Pi_{D_4}$ is related to there being twice as many
discordant as concordant patterns in Figure 2. It can be seen that $\tau^*$ and $t^*$ do not depend on
the scale at which the variables are measured, but only on the ranks or grades of the observations.
Four points are said to be tied if they are neither concordant nor discordant. Clearly, for continuous
distributions the probability of tied observations is zero. Hence, under independence, when all
configurations are equally likely, $\Pi_{C_4} = 1/3$ and $\Pi_{D_4} = 2/3$, and if one variable is a strictly
monotone function of the other, then $\Pi_{C_4} = 1$ and $\Pi_{D_4} = 0$.
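The concordance and discordance probabilities under independence can be checked by enumeration. The sketch below is illustrative only: `strict_lower_pair` and `classify` are hypothetical helpers that implement the definitions of concordant, discordant and tied quadruples given above, and the enumeration relies on the fact that, for continuous data under independence, all 24 orderings of the four $y$-ranks relative to the $x$-ranks are equally likely.

```python
from itertools import combinations, permutations


def strict_lower_pair(values):
    """Return the index pair lying strictly below the other two values, or None."""
    for low in combinations(range(4), 2):
        high = [i for i in range(4) if i not in low]
        if max(values[i] for i in low) < min(values[i] for i in high):
            return frozenset(low)
    return None


def classify(points):
    """Classify four (x, y) points as 'concordant', 'discordant' or 'tied'."""
    low_x = strict_lower_pair([p[0] for p in points])
    low_y = strict_lower_pair([p[1] for p in points])
    if low_x is None or low_y is None:
        return "tied"  # no axis strictly separates the points two by two
    # The two lower x-points are both low or both high in y: opposing quadrants.
    if len(low_x & low_y) in (0, 2):
        return "concordant"
    # Otherwise each open quadrant holds exactly one point.
    return "discordant"


# Under independence all 24 orderings of the y-ranks are equally likely.
counts = {"concordant": 0, "discordant": 0, "tied": 0}
for perm in permutations(range(4)):
    points = [(rank, perm[rank]) for rank in range(4)]
    counts[classify(points)] += 1

print(counts)  # {'concordant': 8, 'discordant': 16, 'tied': 0}
```

The counts 8 and 16 out of 24 correspond to the probabilities 1/3 and 2/3 stated in the text.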

The definition of $\tau^*$ can easily be extended to $X$ and $Y$ in arbitrary metric spaces, but
unfortunately Theorem 1 does not extend then, as it is possible that $\tau^* < 0$. This is shown by the
following example. Consider a set of points $\{u_1,\ldots,u_8\} \subset \mathbb{R}^8$, where $u_i = (u_{i1},\ldots,u_{i8})'$ such that
$u_{ii} = 3$, $u_{ij} = -1$ if $i \ne j$ and $i,j \le 4$ or $i,j \ge 5$, and $u_{ij} = 0$ otherwise. Suppose $Y$ is uniformly
distributed on $\{0,1\}$, and given $Y = 0$, $X$ is uniformly distributed on $u_1,\ldots,u_4$, and given $Y = 1$,
$X$ is uniformly distributed on $u_5,\ldots,u_8$. Then $\tau^* = -1/64$.

Figure 1: Concordant and discordant pairs of points associated with Kendall's tau. Panels: (a) concordant pair, (b) discordant pair.

Still, for $X$ and $Y$ in arbitrary metric spaces, the following result suggests that a test of $\tau^* = 0$
against the alternative $\tau^* > 0$ can have power for certain interesting alternatives: if $(X,Y)$ is the
mixture of a non-degenerate independence model and a point mass, then $\tau^*(X,Y) > 0$ (Theorem 2
in Appendix A).

By the Cauchy-Schwarz inequality, the normalized value

$$\tau_b^* = \frac{\tau^*(X,Y)}{\sqrt{\tau^*(X,X)\,\tau^*(Y,Y)}}$$

does not exceed one. (Note that this notation is in line with Kendall's $\tau_b$, defined analogously.)
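The normalisation can be sketched in a few lines of Python. This is an illustration under the assumption that the same ratio is applied to the sample coefficient, computed by brute force over all index quadruples with an assumed $1/n^4$ scaling; `t_star_b` is a hypothetical name, not one used in the paper.

```python
from itertools import product
from math import sqrt


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def a_sign(z1, z2, z3, z4):
    """Kernel a: sign of a combination of absolute differences."""
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def t_star(x, y):
    """Brute-force sample coefficient, averaged over all index quadruples."""
    n = len(x)
    return sum(a_sign(x[i], x[j], x[k], x[l]) * a_sign(y[i], y[j], y[k], y[l])
               for i, j, k, l in product(range(n), repeat=4)) / n**4


def t_star_b(x, y):
    """Normalised sample coefficient; x and y must both be non-constant."""
    return t_star(x, y) / sqrt(t_star(x, x) * t_star(y, y))


x = [1, 2, 3, 4, 5]
cubes = [v**3 for v in x]
print(t_star_b(x, cubes))  # strictly monotone relation: 1 up to rounding
```

Because the kernel depends on the data only through their ordering, a strictly monotone transformation of one variable leaves the normalised value at its upper bound.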

Note that $\tau^*(X,Y)$ is a function of the copula, which is the joint distribution of $F_X(X)$ and
$F_Y(Y)$, where $F_X$ and $F_Y$ are the cumulative distribution functions of $X$ and $Y$. Nelsen (2006,
Chapter 5) explores the way in which copulas can be used in the study of dependence between
random variables, paying particular attention to Kendall's tau and Spearman's rho.

The remainder of the paper is organized as follows. In Section 3, a comparison is given with 
some other approaches in the literature. In particular, the Cramer von Mises test is essentially a 
special case and there are interesting similarities with a test devised by Hoeffding. In Section 4 
we give a description of independence testing with an artificial and a real data example, briefly 
comparing the tests described in Section 3.2 with a test based on Kendall's tau and the chi-square 
test. 

2 Proof of Theorem 1 

For the proof of Theorem 1, we need Lemma 1, which covers the case that one of the variables is
binary. Note that Lemma 1 provides an extension of Theorem 1, which does not cover the discrete
case.

Lemma 1 If $X$ is binary and $Y$ given $X$ is continuous or discrete, then $\tau^* \ge 0$ with equality if
and only if $X$ and $Y$ are independent.

Before giving the proof, let us first look at the form of $\tau^*$ when one of the variables is binary.
Suppose $X \in \{0,1\}$ and $Y \in \mathbb{R}$ and denote $U = (Y|X=0)$, $V = (Y|X=1)$, and $p = P(X=0)$.



Figure 2: Configurations of concordant and discordant quadruples of points associated with $\tau^*$:
(a) concordant quadruples, (b) discordant quadruples.
The dotted axes indicate strict separation of points in different quadrants; within a quadrant, no
restrictions apply.



Then using (2) it is straightforward to verify that

$$\tau^* = 2p^2(1-p)^2\,\big[P(U_1,U_2 < V_1,V_2) + P(V_1,V_2 < U_1,U_2) - 2P(U_1,V_1 < U_2,V_2)\big] \tag{4}$$

Note that in this case, independence of $X$ and $Y$ is equivalent to $U$ and $V$ having identical
distributions (see Section 3.1 for comments on the resulting two-sample test).

Below, we first prove Lemma 1 separately for the continuous and the discrete case, then we
prove Theorem 1.

Proof of Lemma 1 (the continuous case): Continuity implies

$$P(U_1,U_2 < V_1,V_2) + P(V_1,V_2 < U_1,U_2) + 4P(U_1,V_1 < U_2,V_2) = 1$$

so by (4), $\tau^* \ge 0$ reduces to

$$P(U_1,U_2 < V_1,V_2) + P(V_1,V_2 < U_1,U_2) \ge \frac{1}{3}.$$

Now, with $G$ and $H$ the distribution functions of $U$ and $V$, respectively,

$$\begin{aligned}
P(U_1,U_2 < V_1,V_2) + P(V_1,V_2 < U_1,U_2)
&= 2\int G^2(1-H)\,dH + 2\int (1-G)^2 H\,dH \\
&= 2\int (G^2 + H - 2HG)\,dH \\
&= 2\int (H - H^2)\,dH + 2\int (H-G)^2\,dH \ge \frac{1}{3}
\end{aligned}$$

with equality if and only if $H = G$. □

Proof of Lemma 1 (the discrete case): The lemma is true for random variables that take two
values each (by inspection).

Assume now it is true for random variables with $k$ values. Let $A_k$ be
$P(U_1,U_2 < V_1,V_2) + P(U_1,U_2 > V_1,V_2)$ and $B_k$ be $P(U_1,V_1 > U_2,V_2)$ for some random variables with $k$ values.

Suppose now that $U^*$ has a mixed distribution: it has the same distribution as $U_i$ with probability
$1-\alpha$ and takes a different value larger than all previous possible values of $U_i$ with probability $\alpha$.

Similarly $V^*$ has a mixed distribution: it has the same distribution as $V_i$ with probability
$1-\beta$ and takes the different value mentioned before with probability $\beta$.

We now have

$$P(U_1^*,U_2^* < V_1^*,V_2^*) + P(U_1^*,U_2^* > V_1^*,V_2^*) = \alpha^2\beta^2 A_k + \beta^2(1-\alpha^2) + \alpha^2(1-\beta^2)$$

and

$$P(U_1^*,V_1^* > U_2^*,V_2^*) = \alpha^2\beta^2 B_k + \alpha\beta(1-\alpha\beta).$$

Hence

$$\begin{aligned}
& P(U_1^*,U_2^* < V_1^*,V_2^*) + P(U_1^*,U_2^* > V_1^*,V_2^*) - 2P(U_1^*,V_1^* > U_2^*,V_2^*) \\
&\quad = \alpha^2\beta^2(A_k - 2B_k) + \beta^2(1-\alpha^2) + \alpha^2(1-\beta^2) - 2\alpha\beta(1-\alpha\beta) \\
&\quad = \alpha^2\beta^2(A_k - 2B_k) + (\alpha-\beta)^2.
\end{aligned}$$

But $A_k \ge 2B_k$ with equality if and only if the distributions are identical, so now for random
variables taking $k+1$ values the statement is true with equality if and only if $A_k = 2B_k$ and $\alpha = \beta$,
that is, when the distributions are identical. □

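Proof of Theorem 1: We assume the distribution of $(X_i,Y_i)$ is continuous. We can see that we
need to prove that

$$P(Y_1,Y_2 < Y_3,Y_4 \,\&\, X_1,X_2 < X_3,X_4) + P(Y_1,Y_2 > Y_3,Y_4 \,\&\, X_1,X_2 < X_3,X_4) \ge \frac{1}{18}.$$

This is because, by continuity, $P(X_1,X_2 < X_3,X_4) = 1/6$ and

$$\begin{aligned}
& P(Y_1,Y_2 < Y_3,Y_4 \,\&\, X_1,X_2 < X_3,X_4) + P(Y_1,Y_2 > Y_3,Y_4 \,\&\, X_1,X_2 < X_3,X_4) \\
&\quad + 4P(Y_1,Y_3 > Y_2,Y_4 \,\&\, X_1,X_2 < X_3,X_4) = \frac{1}{6}.
\end{aligned}$$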
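We now have that, writing $x_1 \vee x_2$ for the maximum of $x_1$ and $x_2$,

$$\begin{aligned}
& P(Y_1,Y_2 < Y_3,Y_4 \,\&\, X_1,X_2 < X_3,X_4) + P(Y_1,Y_2 > Y_3,Y_4 \,\&\, X_1,X_2 < X_3,X_4) \\
&= 2\iiint P(Y \le y|X = x_1)\,P(Y \le y|X = x_2)\,\big(1 - P(Y \le y|X > x_1 \vee x_2)\big) \\
&\qquad \cdot \big(P(X > x_1 \vee x_2)\big)^2\,P(Y \in dy|X > x_1 \vee x_2)\,P(X \in dx_1)\,P(X \in dx_2) \\
&\quad + 2\iiint \big(1 - P(Y \le y|X = x_1)\big)\big(1 - P(Y \le y|X = x_2)\big)\,P(Y \le y|X > x_1 \vee x_2) \\
&\qquad \cdot \big(P(X > x_1 \vee x_2)\big)^2\,P(Y \in dy|X > x_1 \vee x_2)\,P(X \in dx_1)\,P(X \in dx_2) \\
&= 2\iiint \big\{P(Y \le y|X = x_1)\,P(Y \le y|X = x_2) + P(Y \le y|X > x_1 \vee x_2) \\
&\qquad - P(Y \le y|X = x_1)\,P(Y \le y|X > x_1 \vee x_2) - P(Y \le y|X = x_2)\,P(Y \le y|X > x_1 \vee x_2)\big\} \\
&\qquad \cdot \big(P(X > x_1 \vee x_2)\big)^2\,P(Y \in dy|X > x_1 \vee x_2)\,P(X \in dx_1)\,P(X \in dx_2).
\end{aligned}$$

We now rewrite the quantity in braces, so that the expression above becomes

$$\begin{aligned}
& 2\iiint \big\{P(Y \le y|X > x_1 \vee x_2) - \big(P(Y \le y|X > x_1 \vee x_2)\big)^2\big\} \\
&\qquad \cdot \big(P(X > x_1 \vee x_2)\big)^2\,P(Y \in dy|X > x_1 \vee x_2)\,P(X \in dx_1)\,P(X \in dx_2) \\
&+ 2\iiint \big(P(Y \le y|X > x_1 \vee x_2) - P(Y \le y|X = x_1)\big)\big(P(Y \le y|X > x_1 \vee x_2) - P(Y \le y|X = x_2)\big) \\
&\qquad \cdot \big(P(X > x_1 \vee x_2)\big)^2\,P(Y \in dy|X > x_1 \vee x_2)\,P(X \in dx_1)\,P(X \in dx_2)
\end{aligned} \tag{5}$$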




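Now

$$\int \big\{P(Y \le y|X > x_1 \vee x_2) - \big(P(Y \le y|X > x_1 \vee x_2)\big)^2\big\}\,P(Y \in dy|X > x_1 \vee x_2) = \frac{1}{6}$$

so the first of the two integrals in (5) is equal to

$$\frac{1}{3}\iint \big(P(X > x_1 \vee x_2)\big)^2\,P(X \in dx_1)\,P(X \in dx_2) = \frac{1}{3}P(X_1,X_2 < X_3,X_4).$$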

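It remains to show that the second integral is non-negative (with 0 for independence). We have

$$\begin{aligned}
& \int_{\mathbb{R}^3} \big(P(Y \le y|X > x_1 \vee x_2) - P(Y \le y|X = x_1)\big)\big(P(Y \le y|X > x_1 \vee x_2) - P(Y \le y|X = x_2)\big) \\
&\qquad \cdot \big(P(X > x_1 \vee x_2)\big)^2\,P(X \in dx_1)\,P(X \in dx_2)\,P(Y \in dy|X > x_1 \vee x_2) \\
&= \int_{\mathbb{R}^3} \big(P(Y \le y, X > x_1 \vee x_2)\,P(X \in dx_1) - P(Y \le y, X \in dx_1)\,P(X > x_1 \vee x_2)\big) \\
&\qquad \cdot \big(P(Y \le y, X > x_1 \vee x_2)\,P(X \in dx_2) - P(Y \le y, X \in dx_2)\,P(X > x_1 \vee x_2)\big)\,P(Y \in dy|X > x_1 \vee x_2).
\end{aligned}$$

The integrand of the expression above has the same sign as

$$\begin{aligned}
& \big(P(X > x_1 \vee x_2|Y \le y)\,P(X \in dx_1) - P(X \in dx_1|Y \le y)\,P(X > x_1 \vee x_2)\big) \\
&\cdot \big(P(X > x_1 \vee x_2|Y \le y)\,P(X \in dx_2) - P(X \in dx_2|Y \le y)\,P(X > x_1 \vee x_2)\big)\,
\frac{P(X > x_1 \vee x_2|Y = y)}{P(X > x_1 \vee x_2)}
\end{aligned} \tag{6}$$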

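To simplify notation, we now define $P(X \le x) = G(x)$, $P(X > x) = \bar G(x)$, $P(X \in dx) = g(x)dx$,
$P(X \le x|Y \le y) = H(x)$, $P(X > x|Y \le y) = \bar H(x)$, $P(X \in dx|Y \le y) = h(x)dx$,
$P(X \le x|Y = y) = F(x)$, $P(X > x|Y = y) = \bar F(x)$ and $P(X \in dx|Y = y) = f(x)dx$. We
rewrite the integral of (6) over $x_1$ and $x_2$ as

$$\begin{aligned}
& \iint \big(\bar H(x_1 \vee x_2)g(x_1) - \bar G(x_1 \vee x_2)h(x_1)\big)\big(\bar H(x_1 \vee x_2)g(x_2) - \bar G(x_1 \vee x_2)h(x_2)\big)
\frac{\bar F(x_1 \vee x_2)}{\bar G(x_1 \vee x_2)}\,dx_1 dx_2 \\
&= 2\int_{\mathbb{R}}\int_{-\infty}^{x_2} \big(\bar H(x_2)g(x_1) - \bar G(x_2)h(x_1)\big)dx_1\,
\big(\bar H(x_2)g(x_2) - \bar G(x_2)h(x_2)\big)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2 \\
&= 2\int_{\mathbb{R}} \big(\bar H(x_2)G(x_2) - \bar G(x_2)H(x_2)\big)\big(\bar H(x_2)g(x_2) - \bar G(x_2)h(x_2)\big)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2 \\
&= 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)\big(\bar H(x_2)g(x_2) - \bar G(x_2)h(x_2)\big)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2 \\
&= 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)\big(\bar H(x_2)g(x_2) - \bar G(x_2)g(x_2) + \bar G(x_2)g(x_2) - \bar G(x_2)h(x_2)\big)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2 \\
&= 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)\big(\bar H(x_2) - \bar G(x_2)\big)g(x_2)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2
 + 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)\big(g(x_2) - h(x_2)\big)\bar G(x_2)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2 \\
&= 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)^2 g(x_2)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2
 + 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)\big(g(x_2) - h(x_2)\big)\int_{x_2}^{\infty} f(z)\,dz\,dx_2 \\
&= 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)^2 g(x_2)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2
 + 2\int_{\mathbb{R}}\int_{-\infty}^{z} \big(G(x_2) - H(x_2)\big)\big(g(x_2) - h(x_2)\big)dx_2\,f(z)\,dz \\
&= 2\int_{\mathbb{R}} \big(G(x_2) - H(x_2)\big)^2 g(x_2)\frac{\bar F(x_2)}{\bar G(x_2)}\,dx_2
 + \int_{\mathbb{R}} \big(G(z) - H(z)\big)^2 f(z)\,dz
\end{aligned}$$

which is non-negative and can only be 0 if $G(x) = H(x)$ for all $x$, that is if $P(X \le x) =
P(X \le x|Y \le y)$ for (almost) all $x$ and $y$, which is equivalent to independence.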



3 Comparison to other approaches 

If one of the variables is binary, our approach leads to the Cramer von Mises test, as described in
Section 3.1. In Sections 3.2 and 3.3 the two direct competitors to tests based on $\tau^*$ known to the
authors, one originally by Hoeffding (1948) and another originally by De Wet (1980) and Deheuvels
(1981), are discussed. Both these competitors share the property of our approach that they are
rank-based and consistent for all alternatives.

Useful and extensive discussions of other ordinal data and nonparametric methods for
independence testing are given by Agresti (2010), Hollander and Wolfe (1999) and Sheskin (2007).

3.1 The two-sample case and relation to the Cramer von Mises test 

The two-sample Cramer von Mises test is used to test whether or not two samples are drawn from
the same distribution, and is consistent for any alternative. We show that if one of the variables
is binary and the conditional distribution of the other variable is continuous, a test based on $\tau^*$
coincides with the Cramer von Mises test. We argue that the test based on $\tau^*$ has a possible
advantage for discrete distributions.

We now give the relationship with the Cramer von Mises test. Let $G$ be the distribution function
of $U$ and let $H$ be the distribution function of $V$. With $F_\alpha = \alpha G + (1-\alpha)H$ let

$$C_\alpha = \int (G - H)^2\,dF_\alpha \tag{7}$$

Then $C_\alpha$ is zero if and only if $G = H$, i.e., if and only if $X$ and $Y$ are independent. The Cramer
von Mises test statistic is based on an estimate of $C_p$. First, note:



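Lemma 2 For $\alpha \in \mathbb{R}$, $C_\alpha$ does not depend on $\alpha$.

Proof: The lemma is implied because

$$\int (G-H)^2\,dH - \int (G-H)^2\,dG = -\int (G-H)^2\,d(G-H) = -\left[\tfrac{1}{3}(G-H)^3\right]_{-\infty}^{\infty} = 0$$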



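The relationship between the Cramer von Mises test, which is based on $C_p$, and $\tau^*$ is given by the
following:

Lemma 3 If $X$ is binary and $Y$ given $X$ is continuous, then $\tau^* = 6p^2(1-p)^2 C_p$.

Proof: First note that

$$\int H\,dH = \frac{1}{2} \quad\text{and}\quad \int H^2\,dH = \frac{1}{3}.$$

Now continuity implies

$$P(U_1,U_2 < V_1,V_2) + P(V_1,V_2 < U_1,U_2) + 4P(U_1,V_1 < U_2,V_2) = 1 \tag{8}$$

Hence by (4) and (8),

$$\begin{aligned}
\tau^* &= 2p^2(1-p)^2\left[\tfrac{3}{2}P(U_1,U_2 < V_1,V_2) + \tfrac{3}{2}P(V_1,V_2 < U_1,U_2) - \tfrac{1}{2}\right] \\
&= 2p^2(1-p)^2\left[3\int G^2(1-H)\,dH + 3\int (1-G)^2 H\,dH - \tfrac{1}{2}\right] \\
&= 2p^2(1-p)^2\left[3\int (G^2 - 2GH + H)\,dH - \tfrac{1}{2}\right] \\
&= 2p^2(1-p)^2\left[3\int (G^2 - 2GH + H^2)\,dH\right] \\
&= 2p^2(1-p)^2\left[3\int (G-H)^2\,dH\right]
\end{aligned}$$

The lemma now follows from Lemma 2. □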

Note that, for discrete distributions, the definition of $C_p$ unsatisfactorily depends on the way $G$
and $H$ are defined, e.g., whether we define $G(u) = P(U < u)$ or $G(u) = P(U \le u)$. Since $\tau^*$ deals
naturally with discreteness of random variables, tests based on $t^*$ might serve as an alternative for
the Cramer von Mises test if discreteness is present.

3.2 Hoeffding's H 

Hoeffding's (1948) coefficient for measuring deviation from independence for a bivariate distribution
function $F_{12}$ with marginal distribution functions $F_1$ and $F_2$ is defined as

$$H = \int (F_{12} - F_1F_2)^2\,dF_{12}$$

(See also Blum, Kiefer, & Rosenblatt, 1961; Hollander & Wolfe, 1999 and Wilding & Mudholkar,
2008.) An alternative formulation given by Hoeffding is

$$H = \tfrac{1}{4}E\,\phi(X_1,X_2,X_3)\,\phi(X_1,X_4,X_5)\,\phi(Y_1,Y_2,Y_3)\,\phi(Y_1,Y_4,Y_5)$$

where $\phi(z_1,z_2,z_3) = I(z_1 \ge z_2) - I(z_1 \ge z_3)$.

Interestingly, Hoeffding's $H$ has an interpretation in terms of concordance and discordance
probabilities closely related to the interpretation of $\tau^*$. With

$$\begin{aligned}
F_{12}(x,y) &= P(X \le x, Y \le y) \\
F_{1\bar 2}(x,y) &= P(X \le x, Y > y) = F_1(x) - F_{12}(x,y) \\
F_{\bar 1 2}(x,y) &= P(X > x, Y \le y) = F_2(y) - F_{12}(x,y) \\
F_{\bar 1\bar 2}(x,y) &= P(X > x, Y > y) = 1 - F_1(x) - F_2(y) + F_{12}(x,y)
\end{aligned}$$

we have the equality

$$F_{12} - F_1F_2 = F_{12}F_{\bar 1\bar 2} - F_{1\bar 2}F_{\bar 1 2} \tag{9}$$

Let five points be $H$-concordant if four are configured as in Figure 2(a) and the fifth is on the
point where the axes cross and, analogously, five points are $H$-discordant if four are configured
as in Figure 2(b) and the fifth is on the point where the axes cross. Denote the probabilities of
$H$-concordance and discordance by $\Pi_{C_5}$ and $\Pi_{D_5}$. Then

$$\int \big(F_{12}^2F_{\bar 1\bar 2}^2 + F_{1\bar 2}^2F_{\bar 1 2}^2\big)\,dF_{12} = \frac{2!\,2!}{5!}\Pi_{C_5} = \frac{1}{30}\Pi_{C_5}$$

and

$$\int F_{12}F_{\bar 1\bar 2}F_{1\bar 2}F_{\bar 1 2}\,dF_{12} = \frac{1}{5!}\Pi_{D_5} = \frac{1}{120}\Pi_{D_5}$$

Hence, using (9),

$$H = \int \big(F_{12}F_{\bar 1\bar 2} - F_{1\bar 2}F_{\bar 1 2}\big)^2\,dF_{12} = \frac{2\Pi_{C_5} - \Pi_{D_5}}{60}$$

3.3 De Wet and Deheuvels' D 

A coefficient related to Hoeffding's $H$ is

$$D = \int (F_{12} - F_1F_2)^2\,dF_1\,dF_2$$

Tests based on estimators of this coefficient were studied by De Wet (1980) and Deheuvels (1981).
Bergsma (2006) showed that in the continuous case, with

$$h(z_1,z_2,z_3,z_4) = |z_1 - z_2| + |z_3 - z_4| - |z_1 - z_3| - |z_2 - z_4|,$$

$$D = \tfrac{1}{16}E\,h(F_1(X_1),F_1(X_2),F_1(X_3),F_1(X_4))\,h(F_2(Y_1),F_2(Y_2),F_2(Y_3),F_2(Y_4))$$

The latter definition is suitable for discontinuous $X$ and $Y$ as well and has the advantage that it
does not depend on the way $F_{12}$ is defined.

Table 1: Artificial contingency table containing multinomial counts. Kendall's tau and the
chi-square test do not yield a significant association, but a permutation test based on $t^*$ yields $p = 0.035$.

3.4 Comparison of $\tau^*$, $H$, and $D$

Hoeffding's $H$ is more complex than $\tau^*$ in that it is based on concordance and discordance of five
points rather than four. See also Kruskal (1958), who compares Kendall's tau and Spearman's rho,
which are based on concordance and discordance probabilities of two and three points respectively,
and for this reason expresses some tentative preference for the simpler Kendall's tau. A possible
drawback of $D$ is that there seems to be some arbitrariness in the use of rank scores; for example,
one might also use normal scores. Of the three coefficients, $\tau^*$ may thus have some advantage.
Tests based on $\tau^*$ are discussed in the next section, which includes a remark on power.

4 Testing independence 

A suitable test for independence is a permutation test which rejects the independence hypothesis
for large values of $t^*$. For every permutation $\pi$ of the observed $y$-values, the sample $\tau^*$-value $t^*_\pi$ is
computed, and the p-value is the proportion of the $\{t^*_\pi\}$ which exceed $t^*$. As is well-known, the
permutation test conditions on the empirical marginal distributions, which are sufficient statistics
for the independence model. In categorical data analysis, it is usually referred to as an exact
conditional test. In practice, the number of permutations may be too large to compute and a
random sample of permutations is taken, which is also called a resampling test. Note that there
does not seem to be a need for an asymptotic approximation to the sampling distribution of $t^*$.
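The resampling version of the test can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used for the results reported in this paper: the statistic is computed by brute force with an assumed $1/n^4$ scaling, permuted values at least as large as the observed one are counted (a slightly conservative variant of "exceed"), and the names `permutation_pvalue`, `n_resamples` and `seed` are hypothetical.

```python
import random
from itertools import product


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def a_sign(z1, z2, z3, z4):
    """Kernel a: sign of a combination of absolute differences."""
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def t_star(x, y):
    """Brute-force sample coefficient, averaged over all index quadruples."""
    n = len(x)
    return sum(a_sign(x[i], x[j], x[k], x[l]) * a_sign(y[i], y[j], y[k], y[l])
               for i, j, k, l in product(range(n), repeat=4)) / n**4


def permutation_pvalue(x, y, n_resamples=200, seed=0):
    """Share of randomly permuted statistics that are at least as large as t*."""
    rng = random.Random(seed)
    observed = t_star(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(y_perm)  # random permutation of the y-values
        if t_star(x, y_perm) >= observed:
            hits += 1
    return hits / n_resamples


x = [1, 2, 3, 4, 5, 6]
print(permutation_pvalue(x, x, n_resamples=100, seed=1))
```

Only the $y$-values are shuffled, so both empirical marginal distributions stay fixed across resamples, in line with the conditioning described above.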

Direct evaluation of $t^*$ requires computational time $O(n^4)$, which may be practically infeasible
for moderately large samples, but $t^*$ can be well-approximated by taking a random sample of subsets
of four observations. The proof of Theorem 1 suggests that the complexity can be reduced to 0{n^). 
An open problem is what the minimum computational complexity of computing t* is. 
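The approximation by random quadruples can be sketched as below. The sketch rests on assumptions of ours: index quadruples are drawn uniformly with replacement, so that the average is an unbiased Monte Carlo estimate of the brute-force value under an assumed $1/n^4$ scaling (drawing subsets of four distinct observations, as described in the text, would be a close variant), and the names `t_star_exact`, `t_star_sampled` and `n_draws` are hypothetical.

```python
import random
from itertools import product


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def a_sign(z1, z2, z3, z4):
    """Kernel a: sign of a combination of absolute differences."""
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def t_star_exact(x, y):
    """Brute-force value over all n^4 index quadruples."""
    n = len(x)
    return sum(a_sign(x[i], x[j], x[k], x[l]) * a_sign(y[i], y[j], y[k], y[l])
               for i, j, k, l in product(range(n), repeat=4)) / n**4


def t_star_sampled(x, y, n_draws=20000, seed=0):
    """Monte Carlo estimate from randomly drawn index quadruples."""
    rng = random.Random(seed)
    n = len(x)
    total = 0
    for _ in range(n_draws):
        i, j, k, l = (rng.randrange(n) for _ in range(4))
        total += a_sign(x[i], x[j], x[k], x[l]) * a_sign(y[i], y[j], y[k], y[l])
    return total / n_draws


x = list(range(8))
y = [3, 1, 4, 0, 5, 7, 2, 6]
print(t_star_exact(x, y), t_star_sampled(x, y, seed=1))
```

Each sampled term lies in $\{-1, 0, 1\}$, so the standard error of the estimate is at most $1/\sqrt{m}$ for $m$ draws, whatever the sample size.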

Below, we compare various tests of independence using an artificial and a real data set. 

An artificial multinomial table of counts is given in Table 1, where X and Y are ordinal variables 
with 5 and 7 categories. Visually, we can detect an association pattern, but as it is non-monotonic 
a test based on Kendall's tau does not yield a significant p-value. The chi-square test also yields 
a non-significant p = 0.253 (based on 10^ resamples), while a permutation test based on t* yields 
p = 0.035 (10^ resamples), giving evidence of an association. We also did tests based on D, which 
yields p = 0.045 (10'^ resamples), and the test based on Hoeffding's H yields p = 0.028 (4000 
resamples). In this example, using a test designed for ordinal data with power against broad 
alternatives, evidence for an association can be found, which is not possible with a nominal data 
test like the chi-square test or with a test based on Kendall's tau.

                          Change in size of ulcer crater (Y)
Treatment group (X)   Larger   Healed (< 2/3)   Healed (>= 2/3)   Healed
A                        6           4                10             12
B                       11           8                 8              5

Table 2: Results of study comparing two treatments of gastric ulcer

Table 2 shows data from a randomized study to compare two treatments for a gastric ulcer 
crater, and was previously analyzed in Agresti (2010). Using 10^ resamples, the chi-square test 
yields p = 0.118, Kendall's tau yields p = 0.019, t* yields p = 0.028, D yields p = 0.026, and using 
lO'' resamples HoefFding's H yields p ~ 0.006. 

For future research, more understanding is needed concerning the power of the tests based on t* , 
H, D, and the chi-square test. We have done some limited simulations and, unsurprisingly, found 
that for all four there are alternatives for which they are most powerful. However, so far we were 
unable to detect any patterns which can lead to useful advice on when to use which test; it appears 
we need a better understanding than we currently appear to have of the types of alternatives that 
might be of most interest. 

A Mixing an independence model with a point mass 

Let $\Omega_1$ and $\Omega_2$ be metric spaces and suppose $X \in \Omega_1$ and $Y \in \Omega_2$ are independent non-degenerate
random variables. Consider the mixture of $(X,Y)$ with the degenerate random variable on the
point $(x_0,y_0) \in \Omega_1 \times \Omega_2$, that is, for some $0 < p < 1$ the mixture $(X',Y')$ is defined as

$$(X',Y') = \begin{cases} (X,Y) & \text{with probability } p \\ (x_0,y_0) & \text{with probability } 1-p \end{cases}$$

Then

Theorem 2 $\tau^*(X',Y') > 0$.

Proof: The proof is done by conditioning on the number of occurrences of $(x_0,y_0)$ among the iid
$(X_1',Y_1'),\ldots,(X_4',Y_4')$. Clearly, $(x_0,y_0)$ can occur 0 to 4 times, each with positive probability, and
$\tau^*$ is the sum of these probabilities times the conditional expectations of the product of

$$\mathrm{sign}\big(|X_1' - X_2'| + |X_3' - X_4'| - |X_1' - X_3'| - |X_2' - X_4'|\big) \tag{10}$$

and

$$\mathrm{sign}\big(|Y_1' - Y_2'| + |Y_3' - Y_4'| - |Y_1' - Y_3'| - |Y_2' - Y_4'|\big) \tag{11}$$

Conditionally on the number of occurrences of $(x_0,y_0)$, the expectation of the product of (10)
and (11) equals the product of their expectations. If $(x_0,y_0)$ occurs 3 or 4 times, both (10) and (11)
are zero, hence zero is contributed to $\tau^*$. Conditionally on $(x_0,y_0)$ occurring 0 or 1 times, the
expectations of both (10) and (11) can easily be seen to equal zero by symmetry reasons.

To prove the theorem, it remains to be shown that conditionally on $(x_0,y_0)$ occurring twice,
the product of (10) and (11) has positive expectation. If either $(X_1',Y_1')$ and $(X_4',Y_4')$ or $(X_2',Y_2')$ and
$(X_3',Y_3')$ equal $(x_0,y_0)$, both (10) and (11) are zero and this will not contribute to $\tau^*$. Without
loss of generality we now only need to consider $(X_3',Y_3')$ and $(X_4',Y_4')$ equalling $(x_0,y_0)$. Then (10)
reduces to

$$-\mathrm{sign}\big(|X_1 - x_0| + |X_2 - x_0| - |X_1 - X_2|\big)$$

and (11) reduces to

$$-\mathrm{sign}\big(|Y_1 - y_0| + |Y_2 - y_0| - |Y_1 - Y_2|\big)$$

By the triangle inequality, both sign terms are nonnegative, so (10) and (11) are both nonpositive
and their product is nonnegative. Since the $X_i$ and $Y_i$ are non-degenerate, both sign terms
have positive probability of being positive, so the product of the expectations of (10) and (11) is
positive. Hence $\tau^* > 0$. □

B Mathematica programme for verifying $\tau^* \ge 0$ for 3x3 tables

The code below verifies nonnegativity of $\tau^*$ for 3x3 contingency tables with uniform marginals. In
the code, we replaced the uniform marginals by a variety of other marginals and always obtained
the same result.

(* module to compute tau-star, where p is an r x c table of probabilities,
   x and y are r x 1 and c x 1 vectors of scores *)
taustarCD[p_, x_, y_] := Module[{sgn, r, c, sxy},
   (* compiled sign kernel; arguments named z1..z4 to avoid clashing with the local c *)
   sgn = Compile[{{z1, _Integer}, {z2, _Integer}, {z3, _Integer}, {z4, _Integer}},
      Sign[Abs[z1 - z2] + Abs[z3 - z4] - Abs[z1 - z3] - Abs[z2 - z4]]];
   {r, c} = {Length[x], Length[y]};
   sxy = Sum[p[[i1, j1]] p[[i2, j2]] p[[i3, j3]] p[[i4, j4]]
         sgn[x[[i1]], x[[i2]], x[[i3]], x[[i4]]]
         sgn[y[[j1]], y[[j2]], y[[j3]], y[[j4]]],
      {i1, r}, {i2, r}, {i3, r}, {i4, r}, {j1, c}, {j2, c}, {j3, c}, {j4, c}]
   ]

{r, c} = {3, 3}; (* 3 rows, 3 columns *)

(* fixed uniform marginals, can be modified *)
xmarg = {1/3, 1/3, 1/3};
ymarg = {1/3, 1/3, 1/3};

(* compute t: tau-star for 3x3 table *)
pp = Table[p[i, j], {i, r}, {j, c}];
x = Range[r];
y = Range[c];
t = taustarCD[pp, x, y] // Simplify // PowerExpand // Simplify;

(* specify assumption of nonnegative probabilities *)
nonnegprob = Map[# >= 0 &, Flatten[pp]];

(* define t2 which is t but a function of only (r-1)(c-1)
   probabilities with given marginals *)
fixmarg1 = Table[p[i, c] -> xmarg[[i]] - Sum[p[i, j], {j, 1, c - 1}], {i, 1, r}];
fixmarg2 = Table[p[r, j] -> ymarg[[j]] - Sum[p[i, j], {i, 1, r - 1}], {j, 1, c}];
t2 = t //. Join[fixmarg1, fixmarg2] // Simplify

(* check if t2 >= 0 *)
Simplify[t2 >= 0, Assumptions -> nonnegprob]

Evaluation gives the result True.